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Abstract 

We propose a semiparametric independent-component model for the intensity func¬ 
tions of a point process. When independent replications of the process are available, 
we show that the estimators are consistent and asymptotically normal. We study the 
finite-sample behavior of the estimators by simulation, and as an example of application 
we analyze the spatial distribution of street robberies in the city of Chicago. 
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1 Introduction 

Point processes in time and space have a broad range of applications, in diverse areas 
such as neuroscience, ecology, finance, astronomy, seismology, and many others. Exam¬ 
ples are given in classic textbooks like Cox and Isham (1980), Diggle (2013), Mpller and 
Waagepetersen (2004), Streit (2010), and Snyder and Miller (1991), and in the papers cited 
below. However, the point-process literature has mostly focused on the single-realization 
case, such as the distribution of trees in a single forest (Jalilian et ah, 2013) or the dis¬ 
tribution of cells in a single tissue sample (Diggle et ah, 2006). But situations where 
several replications of a process are available are increasingly common. Among the few 
papers proposing statistical methods for replicated point processes we can cite Diggle et 
al. (1991), Baddeley et al. (1993), Diggle et al. (2000), Bell and Grunwald (2004), Landau et 
al. (2004), Wager et al. (2004), and Pawlas (2011). However, these papers mostly propose 
estimators for summary statistics of the processes (the so-called F, G and K statistics) 
rather than for the intensity functions that characterize the processes. 

When several replications of a process are available, it is possible to estimate the inten¬ 
sity functions directly thanks to the possibility of “borrowing strength” across replications. 
This is the basic idea behind many functional data methods that are applied to stochastic 
processes which, individually, are only sparsely sampled (James et ah, 2000; Yao et ah, 
2005; Gervini, 2009). Functional Data Analysis has mostly focused on continuous-time 
processes; little work has been done on discrete point processes. We can mention Bouzas 
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et al. (2006, 2007) and Fernandez-Alcala et al. (2012), which have rather limited scopes 
since they only estimate the mean of Cox processes, and Wu et al. (2013), who estimate the 
mean and the principal components of independent and identically distributed realizations 


of a Cox process, but the method of Wu et al. (2013) is based on kernel estimators of 


the covariance function that are not easily extended beyond i.i.d. replications and time- 
dependent processes. The semiparametric method we propose in this paper, on the other 
hand, can be applied just as easily to temporal or spatial processes, and can be extended 
to non-i.i.d. situations like ANOVA models or more complex data structures like marked 
point processes and multivariate processes, even though we will not go as far in this paper. 

As an example of application we will analyze the spatial distribution of street robberies 
in Chicago in the year 2014, where each day is seen as a replication of the process. The 
method we propose in this paper fits a non-negative independent-component model for the 
intensity functions, and provides estimators of the intensity functions for the individual 
replications as by-products. We establish the consistency and asymptotic normality of the 
component estimators in Section [3l study their finite-sample behavior by simulation in 
Section [H and analyze the Chicago crime data in Section [5l 

2 Independent-component model for intensity process 

A point process A is a random countable set in a space S, where S is usually M for 
temporal processes and or for spatial processes (Mpller and Waagepetersen, 2004, 
ch. 2; Streit, 2010, ch. 2). Locally finite processes are those for which f] B) < oo 
with probability one for any bounded B C S. For such processes we can define the count 
function N{B) = #(A n B). Given a locally integrable function A : 5 —[0,oo), i.e. a 
function such that X(t)dt < oo for any bounded BOS, the process A is a Poisson 
process with intensity function X{t) if (i) N(B) follows a Poisson distribution with rate 
X{t)dt for any bounded BOS, and (ii) conditionally on N{B) = m, the m points in 
X n B are independent and identically distributed with density X{t) = X{t)/ X for any 
bounded BOS. 

Let Xb = a n i? for a given B. Then the density function of Xb aX xb = {ti, ■ ■ ■ , tm} 
is 


f{xB) = .. ,tm\m) 


( 1 ) 



Since the realizations of A^ are sets, not vectors, what we mean by ‘density’ is the following: 
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if M is the family of locally finite subsets of S, i.e. 


AA = {A C 5 : #{A Cl B) < oo for all bounded i? C 5} , 


then for any F C M 


OO n n 

P{Xb£F) = ^ - ^ F)f{{ti,...,tm})dti---dtm 

m=odB Jb 

^0 Jb r-i 


and, more generally, for any function h : M ^ [0, oo) 


OO n p 

E{h{XB)}='y] / ••• / h{{h,---,tm.})f{{ti,...,tm})dt 

Jb 


]^ * * * dt^^ 


A function h on J\f is, essentially, a function well defined on for any integer m which is 
invariant under permutation of the coordinates; for example, h{{ti 

Single realizations of point processes are often modeled as Poisson processes with fixed 
As. But for replicated point processes a single intensity function A rarely provides an ade¬ 
quate fit for all replications. It is more reasonable to assume that the As are subject-specihc 
and treat them as latent random effects. These processes are called doubly stochastic 
or Cox processes (Mpller and Waagepetersen, 2004, ch. 5; Streit, 2010, ch. 8). We can 
think of a doubly stochastic process as a pair (X, A) where A|A = A is a Poisson process 
with intensity A, and A is a random function that takes values on the space F of non¬ 
negative locally integrable functions on S. This provides an appropriate framework for the 
type of applications we have in mind: the n replications can be seen as i.i.d. realizations 
(Ai, Ai),..., {Xn, An) of a doubly stochastic process {X, A), where X is observable but A 
is not. In this paper we will assume that all AjS are observed on a common region B of S; 
the method could be extended to AjS observed on non-conformal regions Bi at the expense 
of a higher computational complexity. 

For many applications we can think of the intensity process A as a finite combination 
of independent components, 

p 

Mt) = ^Uk(l)kit), (3) 

k=l 

where (pi,... ,(j)p are functions in F and Ui,... ,Up are independent nonnegative random 
variables. This is the functional equivalent of the multivariate Independent Component 
decomposition (Hyvarinen et ah, 2001). For identifiability we assume that J^(j)k(t)dt = 1 
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for all k, i.e. that the (/>^s are density functions. Then A{t)dt = Y%=i and, if we 
define S = X]fc=i ^k and Wk = Uk/S, we have 

p 

A{t) = SA{t), with A{t) = ^ WkMt)- (4) 

k=l 

The intensity process A(t), then, can be seen as the product of an intensity factor S 
and a scatter factor A{t) which is a convex combination of the Conditionally on 

A{t) = A(t) the count function N(B) has a Poisson distribution with rate S = s, and 
conditionally on N{B) = m the m points in Xb are independent identically distributed 
realizations with density A(t) which is determined by the W^s. In general S and W are 
not independent, but if the UkS are assumed to follow independent Gamma distributions, 
say Uk ~ Gamma(Q:fc,/3) with a common /3, then S and W are independent with S ~ 
Gamma(^ Ofc,/?) and W ~ Dirichlet(ai,..., Op) (Bilodeau and Brenner, 1999, ch. 3). 

The marginal density of with a slight abuse of notation in writing xb = (m,t), 
can be expressed as 


f{xB) 


f{m, t, s, w) ds dw 

f{t I m, w)/(m I s)f{s,w) ds dw 


( 5 ) 


with 


m p 


/(t I m,'w) = Wk4>k{tj)- 


j=i k=i 

Since A(t) is a mixture of the 4>i^s, for computational convenience we introduce indicator 
vectors yi,... ,ym such that, for each j, yjk = 1 for some k and yjk' = 0 for all k' ^ k, 
with P{Yjk = 1 I m, w) = Wk- In words, yj indicates which the point tj is coming from. 
Gonditionally on m and w, the yjS are independent Multinomial(l, w). Then, collecting 
all y^s into a single y for notational simplicity, (l5|) can also be written as 


fixB) = 


y^/("i,t,y,s,w) ds dw 
y 

^/(t I m,y)/(y | m,w)/(m | s)f{s,w) ds dw 


with 


m p 


/(t I m,y) = 

j=l k=l 
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and 


m p 

/(y I "i,w) = jj wl^\ 

j=i k=i 

Although the y^s may at first look like an unnecessary complication, they actually simplify 
the EM algorithm and are also useful for data analysis, as we will show in Section [5l 

To estimate the </>fcS we use a semiparametric approach: we assume they belong to 
a family of functions B with non-negative basis functions I3i{t),... ,Pq{t) defined on the 
region of interest B; for example, we can use B-splines for temporal processes and tensor- 
product splines or radial basis functions for spatial processes. Then 4>k{t) = c^f3{t) with 
Cfc > 0 to guarantee that <i>k{x) > 0. Note that the specific nature of 5, i.e. whether the 
process is temporal or spatial, only plays a role here, in the specification of B] for all other 
purposes we make no distinctions between temporal and spatial processes. 

The model parameters are estimated by penalized maximum likelihood. In addition to 
the component coefficients ci,..., Cp, the distribution of S and W will depend on certain 
parameters that we will collect in a vector rj. The whole model, then, is parameterized by 
0 = (ci,..., Cp, 77 ) and the parameter space is 0 = C x where 

C = {(ci,... ,Cp) : a^Cfc = 1 , Cfc > 0 for all k] , 

with a = (3{t)dt, and T-L is the space of the rjs. Then, given n independent realizations 

xbi, ■ ■ ■ ,XBn with XBi = {Ui, ..., the likelihood function is 

n 

L{e) = Wf{xB^■^e) ( 6 ) 

1=1 

with f{xB', 0) given by dH). Since the dimension of B is typically large, a roughness penalty 
has to be added to ([H]) to obtain smooth ij^^s. We use a penalty of the form —QP{0) with 
C > 0 and P(0) = Yl’k=i 9{(l>k)^ where 

g{(j)) = [ dt, 

Jb 

H denotes the Hessian and \-\p the Frobenius matrix norm. So for a temporal process 

5 (</>) = and for a spatial process g{4>) = f{(^f + 

which are quadratic functions in the coefficients c^. Then 6 maximizes 

p„(0)=n-ilogL(0)-CP(0) (7) 

among 0 E 0 for a given smoothing parameter C. The smoothing parameter as well as 
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the number of components p can be chosen by cross-validation, maximizing 

n 

CV(C,p) = J^logf(xBi;^(-i)), (8) 

i=l 

where 0(_i) denotes the penalized maximum likelihood estimator for the reduced sample 
with XBi deleted. An EM algorithm for the computation of 0 is described in detail in the 
Supplementary Material. 

3 Asymptotics 

The asymptotic behavior of 0 as n —)• oo can be studied via standard empirical-process 
techniques (Pollard, 1984; Van der Vaart, 2000), since d?]) is the average of independent 
identically distributed functions plus a non-random roughness penalty, as in e.g. Knight 
and Fu (2000). 

In principle two types of asymptotics may be of interest: the “nonparametric” asymp¬ 
totics, where the dimension of the basis space B grows with n, and the “parametric” 
asymptotics, where the true (/>^s are assumed to belong to B and then the dimension of B 
is held fixed. We will follow the second approach here, which is simpler and sufficient for 
practical purposes and has been followed by others (e.g. Yu and Ruppert, 2002, and Xun 
et ah, 2013) in similar semiparametric contexts. 

The constraints of the space C introduce some complications in an otherwise standard 
asymptotic analysis. We will follow the approach of Geyer (1994). Proofs of the results 
presented here can be found in the Supplementary Material. The first result of this section, 
consistency of the estimator, is essentially a consequence of model identifiability. It is not 
obvious that model ([3]) is identifiable, so this is proved first in the Supplementary Material. 

Theorem 1 If the smoothing parameter goes to zero as n ^ oo, either determin- 

'' P 

istieally or in probability, then On 0o- 

As explained in Section [51 the parameter space 0 is of the form C xH with an % that 
we will assume convex and open. For example, for the model with independent Gamma 
scores, rj = (oi,..., Op, /3) and H = (0, -|-oo)^+^. So H is not a problem for the asymptotics 
because it does not contribute active constraints. The problem is the space C, which is a 
convex but closed set. In particular, the non-negativity constraints create some unusual 
asymptotics; for example, if a given co,fcj is zero, it is clear that y/n{cn^kj — co,kj) cannot 
be asymptotically normal since it is a nonnegative quantity for all n. 

To handle these constraints we need to introduce the notion of tangent cone. Let r and 
d be such that H and 0 C The tangent cone of 0 at Oq is the set of all directions 
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and limits of directions from which 9q can be approached with sequences that stay inside 
0. If 00 is in the interior of 0 it can be approached from any direction, so the tangent cone 
is the whole in that case. But if Oq is on the border of 0 it can only be approached 
from certain directions; for example, if co,ii = 0 then the approaching sequences must 
necessarily satisfy cn > 0 to stay inside 0. An in-depth treatment of tanget cones can be 
found in Hiriart-Urruty and Lemarechal (2001, ch. A.5). The tangent cone of the product 
of convex sets is the product of the respective tangent cones, so the tangent cone of 0 at 
00 is Aq = T X M’’, with 

T = {(vi,. .. ,Vp) : a'^Vfc = 0, Vkj > 0 for j E 4, /c = 1,... ,p} , (9) 

and 4 = {j : = 0}. 

The following theorem gives the asymptotic distribution of 0^, in a form as explicit as 
can be given for a general 00- Let Fq be Fisher’s Information Matrix, 

Fo = Ee,{Vlogf{XB-,eo)Vlogf{XB-,eof} 

= -Ee,{V^logf{XB;eo)}, 

where the derivatives of log f{xB', 0) are taken with respect to the parameter 0. 

Theorem 2 If k as n ^ oo, either deterministically or in probability, then 

^(0„ - 0o) ^ (5(Z), where S{Z) is the maximizer of 

IT(5) = {Z- KXP{eo)}^S - 

over S E Aq, and Z ~ N(0,Fo). 

Although a closed expression for ^(Z) in Theorem [2] is not available in general, ^(Z) 
is easy to simulate: one generates M Monte Carlo samples Zi,... ,Zm from a N(0 ,Fq) 
distribution and then computes S(Zi) for each Zj, which involves a simple quadratic opti¬ 
mization problem with linear constraints. But in the particular case where all component 
coefficients CQ^kj are strictly positive, we can be more explicit: write a 5 E Aq as 5 = (^i, ^ 2 ) 
with E T and <^2 E M'’; since there are no inequality constraints in Q, the only restric¬ 
tion for (5i is that (4 <8) a^)<5i = Op. The matrix 4 (g) is p x pq of rank p, so there exists 
an orthogonal pq x {pq — p) matrix F such that (4 ® a^)r = O and =r^ for some 
unconstrained ^ E Let 

\ O Ir ) 
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Then 


W{S) = T? 


i 

82 




i 

82 


with Z = A{Z — kVP{Oo)} and Fq = AFqA^ , so 


S{Z) = A 


m 

62m 


with 


^(Z) 

62m 


= Fo'Z. 


Note that Z ~ N(/x,Fo) with /x = —kAVP{6q), so (^(Z),d 2 (Z)) has a non-singular 
N(FoV,Fo') distribution but S{Z) itself has a singular N(AFq /i, AFq A^) distribution. 
Nevertheless, the diagonal elements of AFq PJ' jn can be used as variance estimators to 
construct, for example, confidence intervals for the elements of Q. 


4 Simulations 

We ran some simulations to study the finite-sample performance of the estimators. We con¬ 
sidered two models like ([3l), both with two components: Model 1 had i;/>i(t) = exp{—100(t — 
.3)^}/.177 and (^ 2 ^) = exp{—100(t — .7)^}/.177 as components, for t E [0,1], which are 
two peaks with practically no overlap; Model 2 had (/>i(t) = exp{—20(t — .3)^}/.385 and 
4'2i't) = exp{—20(t — .7)^}/.385, for t € [0,1], which are also unimodal but flatter functions 
with more overlap. The component scores Ui and U 2 had lognormal distributions with 
E{Ui) = 30, E{U 2 ) = 20, V{Ui) = 10 and V{U 2 ) = 1 for both models. Two sample sizes 
were considered, n = 50 and n = 150. 

For estimation we used cubic i3-sphnes for the with K = 5 and A = 10 equispaced 
knots, and Gamma-distributed component scores. The estimating models are somewhat 
different from the data-generating models, which allows us to investigate the robustness 
of the estimators to model misspecification. The smoothing parameter was chosen by 
five-fold cross-validation; for comparison we also computed estimators with the optimal 
Copt that minimizes the mean squared error, which gives us a lower bound that cannot be 
attained in practice but serves as a benchmark to judge the adequacy of five-fold cross- 
validation as a method for choosing C- 

Table [T] summarizes the results. For each estimator we report bias = {J[E{(p{t)} — 
std = {/ E{[^{t) — and rinse = (f E[{(j){t) — 4>{t)}‘^]dty/‘^, 

where the expectations have been approximated by 500 Monte Carlo replications. We do 
not report Monte Carlo standard errors on the table, but, for example, the hrst mean 
squared error reported, .732^ = .536, has a Monte Carlo standard deviation of .001, which 
gives a 95% conhdence interval (.534, .538) for the mean squared error and (.730, .733) 


n = 50 


Model 1 


n = 150 


K = 5 K = 10 K = 5 K = 10 


Estimator 

bias 

std 

rmse 

bias 

std 

4>i cv 

.73 

.07 

.73 

.30 

.14 

(j )2 cv 

.81 

.08 

.82 

.35 

.17 

(j)i opt 

.73 

.04 

.73 

.18 

.07 

4>2 opt 

.81 

.04 

.81 

.20 

.09 



bias 

std 

rmse 

bias 

std 

<('1 cv 

.31 

.07 

.32 

.19 

.08 

4>2 CV 

.33 

.08 

.34 

.22 

.10 

(l)l opt 

.30 

.08 

.31 

.09 

.07 

4>2 opt 

.32 

.07 

.33 

.07 

.08 


Table 1: Simulation Results. Bias, standard 
independent component estimators. 


rmse 

bias 

std 

rmse 

bias 

std 

rmse 

.33 

.73 

.09 

.73 

.17 

.12 

.20 

.39 

.82 

.09 

.82 

.19 

.14 

.24 

.19 

.73 

.03 

.73 

.14 

.07 

.16 

.22 

.82 

.03 

.82 

.15 

.08 

.17 

Model 2 






rmse 

bias 

std 

rmse 

bias 

std 

rmse 

.20 

.31 

.05 

.32 

.19 

.04 

.19 

.24 

.33 

.07 

.34 

.22 

.05 

.23 

.11 

.30 

.05 

.30 

.08 

.06 

.10 

.11 

.32 

.04 

.32 

.07 

.07 

.10 


deviation and root mean squared error of 


for the root mean squared error, so the quantities in Table [T] are accurate up to the two 
reported decimals. We see in Tabled] that the comparative behavior of the estimators with 
respect to sample size and number of knots is similar for both models and both components, 
although (j)i is easier to estimate than 1^2 because it is the dominant component. We see 
that ten knots produce much better estimators than five knots, even for a sample size as 
small as 50. For a fixed sample size we see that increasing the number of knots reduces the 
bias and increases the variance, as expected, but the gains in bias amply compensate for the 
increases in variance, so the overall mean squared errors are much smaller. The estimators, 
then, successfully “borrow strength” across replications, and it is best to err on the side 
of using too many basis functions rather than too few. Overall, the performance of the 
cross-validated estimators is not much worse than that of the optimal estimators, so five¬ 
fold cross-validation is to be an adequate method for choosing the smoothing parameter, 
although there is some room for improvement. 

It is also of interest to compare the behavior of the new method with the method of 
Wu et al. (2013). Direct comparison of the component estimators is not possible because 
the method of Wu et al. (2013) is not based on a non-negative decomposition for the 
process A(t) like model (l3|). Briefly, the method of Wu et al. (2013) works as follows: first, 
= Fl{A(t)} and p{s,t) = Cov{A(s), A(t)} are estimated by kernel smoothers, and the 
eigenfunctions of p, which can be negative, are computed; then the individual densities 
Aj are estimated using the eigenfunctions of p(s, t) as basis, truncating and renormalizing 
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Model 1 



n = 

50 

n = 

150 

Estimator 

intensity 

density 

intensity 

density 

IC-based, cv 

14 

.26 

11 

.18 

IC-based, opt 

10 

.17 

10 

.15 

PC-based, opt 

13 

.18 

12 

.14 



Model 2 


IC-based, cv 

7.0 

.10 

6.5 

.09 

IC-based, opt 

6.8 

.10 

6.7 

.10 

PC-based, opt 

9.6 

.12 

9.3 

.11 


Table 2; Simulation Results. Root mean squared errors of intensity and density function 
estimators. 

the AjS if necessary to make them non-negative; finally, the intensities A* are estimated 
as Aj = rriiXi, where is the number of observations for replication i. Since we cannot 
compare the component estimators directly, we will compare the intensity estimators and 
the density estimators. The method of Wu et al. (2013) has three main tuning parameters: 
the bandwidths of the kernel smoothers and the number of components to include in 
the expansions of the AjS. Wu et al. (2013) discuss a number of strategies to choose 
these parameters. Here we use the optimal Gaussian bandwidths for the kernel smoothers 
(Silverman, 1986, ch. 3.4 and 4.3), which produce reasonably smooth estimators of and 
p{s, t) in our simulations, and instead of choosing the number of components p using Wu et 
al.’s suggestions we simply report the estimation errors for the optimal p, which happens 
to correspond to p = 1. Note that Wu et al.’s model has a mean while ([31) does not, so a 
one-principal-component model has the same dimension as a two-independent-component 
model. 

Table [2] shows the estimated root mean squared errors 11'^* “ 

{E{Y17=i ll^i “ based on 500 Monte Carlo replications. In fact, the same 

simulated data and the same independent-component estimators as in Tabled] were used. 
We only report the results for the ten-knot estimators, since they were uniformly better 
than the five-knot estimators. Again, we report the errors of both the cross-validated and 
the optimal independent-component-based estimators. Note that for Wu et al.’s principal- 
component-based estimators we are choosing the optimal p, so the errors observed in prac¬ 
tice will be somewhat higher than those reported in Table [2l Unlike the results in Table 
dl the results in Table [2] do change with the model. For Model 1 the results are mixed, 
depending on the sample size and on whether we are estimating the density function or the 
intensity function. But for Model 2 the independent-component-based estimators, even the 
cross-validated ones, clearly outperform Wu et al.’s principal-component-based estimators. 
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Figure 1: Chicago Street Theft, (a) Location of reported incidents in the year 2014. (b) 
Kernel density estimator of the data in (a). 

5 Application: Street Theft in Chicago 

In this section we analyze the spatial distribution of street robberies in Chicago during 
the year 2014. The data was downloaded from the City of Chicago Data Portal, a very 
extensive data repository that provides, among other things, detailed information about 
every reported criminal incident in the city. The information provided includes type, 
date, time, and coordinates (latitude and longitude) of the incident. Here we will focus 
on crimes typified as of primary type “theft” and location “street”. There were 16,278 
reported incidents of this type between January 1, 2014 and December 31, 2014. Their 
locations cover most of the city, as shown in Figure [T]( a). 

Investigating the spatial and temporal distribution of crime is important because, as 
noted by Ratcliffe (2010), crime opportunities are not uniformly distributed in space and 
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time, and the discovery and analysis of spatial patterns may help better understand the 
role of geography and opportunity in the incidence of crime. The geographical analysis of 
crime is not new and goes back at least to Guerry and Quetelet in the early 1800s (Friendly, 
2007), but technological and data limitations hampered earlier attempts at crime mapping. 
Only in the last few decades have software and hardware developed to the point of allowing 
efficient crime mapping, and only in the last few years have extensive crime data repositories 
like the City of Chicago Data Portal become available. 

Awareness of the geographical and temporal distribution of crime is necessary, for ex¬ 
ample, for efficient allocation of crime-prevention resources. Sometimes even the most basic 
spatial analysis reveals unexpected results. Figure HKb) shows a kernel-density estimator 
for the points in Figure [T](a). Those familiar with the Chicago area will immediately notice 
that the mode occurs at the neighborhoods of Wicker Park and Pulaski, an entertainment 
area with high pedestrian traffic that is not typically associated with crime and violence 
in people’s mind. But it is well known to criminologists that street robbery tends to con¬ 
centrate not on deprived neighborhoods per se but on places that they denominate “crime 
attractors”, areas that “bring together, often in large numbers, people who carry cash, 
some of whom are distracted and vulnerable” (Bernasco and Block, 2011). 

We fitted independent component models for these data, using the 365 days as repli¬ 
cations. We considered ps ranging from 2 to 5 and component scores with Gamma distri¬ 
bution. As basis family B for the components we used normalized Gaussian radial kernels 
with 49 uniformly distributed centers in the rectangle [—87.84, —87.53] x [41.65,42.03], the 
smallest rectangle that includes the domain of interest B, the city of Chicago. The smooth¬ 
ing parameter was chosen by five-fold cross-validation for each p. The cross-validated 
criteria for p = 2,..., 5 were 179.37, 177.73, 186.04 and 185.05, respectively, so we chose 
the model with four components. For this model the d^s were 4.95, 4.38, 4.04 and 3.51, 
and /? was 2.70, so the respective expected values of the UkS were 13.37, 11.85, 10.93 and 
9.50. The overall expected number of robberies per day in the city is then 45.64. 

Contour plots of the four components are shown in Figure [2j We computed the asymp¬ 
totic variance estimators derived in Section [3] for the component coefficients and it turned 
out that many CkjS were indeed not significantly different from zero, but the plots obtained 
after filtering out these components were virtually identical to those in Figure [21 so they 
are not shown here. 

The components in Figure [2] are well-localized and easily interpretable. The first com¬ 
ponent has a mode on the border of Near North and Lincoln Park neighborhoods, and 
extends northwards towards Lakeview and Uptown. These are the most affluent neighbor¬ 
hoods in Chicago, normally not associated with crime in people’s perceptions, but the large 
number of affluent people on the streets make them ideal “attractors” for street theft. The 
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Figure 2: Chicago Street Theft. Contour plot of (a) first, (b) second, (c) third, and (d) 
fourth component. 
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second component has a mode on Washington Park and extends eastwards towards the 
University of Chicago campus, neighborhoods in the South side of the city that are well- 
known to be problematic; the proportion of households below poverty level in Washington 
Park is 39%, and the unemployment level is 23%. 

The third component is bimodal, with the main mode at West Town, roughly where 
the mode of the overall mean was (Figure [2Kb)), and a second mode at Woodlawn and 
South Shore neighborhoods. These are also neighborhoods with generally high levels of 
crime and poverty, although not as high as Washington Park. The fourth component is 
also bimodal. The main mode concentrates on the West side neighborhoods of Humboldt 
Park and West Garfield, and the second mode on the South side neighborhoods of West 
Englewood and Gage Park. These are the worst neighboorhoods in the city in terms of 
overall crime and poverty; for example, the proportion of households below poverty level is 
33% in Humboldt Park and 32% in West Engelwood, and the percentages of people with 
no high school diploma are 37% and 30%, respectively. These similarities in demographics 
suggest that the bimodality of the components is not an artifact of the estimators but a 
real feature of the data. 

Model ([3]) also provides estimators for the individual intensity functions, Xi{t) = Ylk=i 
Two examples are shown in EigureO Eigure[3](a) corresponds to November 8, the day with 
the largest first component score, and Figure[3Kc) to July 25, the day with the largest fourth 
component score. There was a total of 43 thefts in November 8, 35 of which were associated 
with the first component, as determined by the yijkS] specifically, for each incident tij we 
found kij = argmax^ Vijk-, which indicates what component tij is most strongly associated 
with. The corresponding estimator of the intensity function is shown in Figure EKb), which 
accurately reflects the distribution of the points in Figure [3Ka). In July 25 there was a 
total of 72 street robberies, 17 of which were associated with the first component, 17 with 
the second, 0 with the third and 38 with the fourth. The corresponding intensity function 
is shown in Figure EKd) , and it is again an accurate representation of the distribution of 
the points in Figure [3Kc). If we were using individual kernel smoothers for the intensity 
functions, with only 43 or even 72 data points we would not be able to obtain estimators 
that are smooth and fine-detailed at the same time, like those in Figure [3l a small band¬ 
width would produce irregular estimators and a large bandwidth would produce featureless 
estimators. 

An analysis of the component scores also reveals interesting patterns. It is known that 
some types of crime tend to follow seasonal patterns; violent crimes in particular tend to 
increase during the summer (Field, 1992). Figure 0] shows that this is indeed the case for 
street theft in Ghicago. Even though there is a large variability, the seasonal trends are 
plain to see, in particular for the second component which shows a very clear peak in July. 
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Figure 3; Chicago Street Theft. Incidents reported in (a) November 8 and (b) July 25 
of 2014. Corresponding intensity estimators based on a four-component model, for (b) 
November 8 and (d) July 25. 
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Figure 4; Chicago Street Theft. Daily component scores and lowess smoother (solid line) 
for (a) first, (b) second, (c) third, and (d) fourth component. 
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Figure 5: Chicago Street Theft. Kernel density estimators of the distribution of time of 
reported incidents, for incidents associated with (a) first, (b) second, (c) third, and (d) 
fourth component. 

The hrst and third components also reach their maxima in July, although their curves 
are flatter. The fourth component, on the other hand, has two local peaks in April and 
September. Whether these peaks are systematic across the years may be established by 
analyzing data from previous years, since the City of Chicago Data Portal has daily crime 
data going back to 2001, but we will not do that here. 

Finally, we investigated the distribution of the time of the incidents. Figure [5] shows 
kernel density estimators of the time of robbery for each component. Since the minimum 
always occurs at 5am, for better visualization we shifted the points between 0 and 4am to 
the other end by adding 24, so e.g. Sam became 27. We see that the largest number of 
incidents tend to occur around 8 pm for all regions, but while the distribution is strongly 
unimodal for the North side neighborhoods (Figure [5j a)), for the other regions there are 
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clear plateaus earlier in the day (Figures [5^b-d)). These plots, however, may hide seasonal 
variations like those seen in Figure HI A more thorough analysis of these variations would 
need models that incorporate covariates, but this is beyond the scope of this paper. 
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